Optimal Decision Trees for Categorical Data via Integer Programming
Authors
Abstract
Decision trees have been a popular class of predictive models for decades because they are interpretable and perform well on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a novel mixed integer programming formulation to construct optimal decision trees of a prespecified size. We take the special structure of categorical features into account and allow combinatorial decisions (based on subsets of values of features) at each node. We show that very good accuracy can be achieved with small trees using moderately sized training sets. The optimization problems we solve are tractable with modern solvers.
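To make the idea of a combinatorial decision on a categorical feature concrete, here is a minimal sketch in Python. It is not the paper's mixed integer programming formulation: it swaps in brute-force enumeration over subsets of category values for a single node (a depth-1 tree), and the function names and toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def leaf_errors(labels):
    """Misclassifications when a leaf predicts its majority label."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def best_subset_split(values, labels):
    """Exhaustively search for the subset S of category values that
    minimizes training errors for the rule "go left if value in S".

    Illustration only: enumeration is exponential in the number of
    categories, which is why an optimization formulation is of interest.
    """
    categories = sorted(set(values))
    # Baseline: no split at all (empty subset, a single leaf).
    best = (leaf_errors(list(labels)), frozenset())
    for size in range(1, len(categories)):
        for subset in combinations(categories, size):
            s = frozenset(subset)
            left = [y for v, y in zip(values, labels) if v in s]
            right = [y for v, y in zip(values, labels) if v not in s]
            errors = leaf_errors(left) + leaf_errors(right)
            if errors < best[0]:
                best = (errors, s)
    return best


# Hypothetical toy data: one categorical feature and binary labels.
colors = ["red", "blue", "green", "red", "blue", "green", "yellow", "yellow"]
labels = [1, 0, 1, 1, 0, 1, 0, 0]

errors, subset = best_subset_split(colors, labels)
print(errors, sorted(subset))
```

On this toy data, sending the subset {blue, yellow} to one branch separates the two classes at a single node, whereas any split on a single category value leaves two training errors. That is the kind of gain a subset-based decision can offer a small tree.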
Similar resources
Optimal Generalized Decision Trees via Integer Programming
Decision trees have been a very popular class of predictive models for decades due to their interpretability and good performance on categorical features. However, they are not always robust and tend to overfit the data. Additionally, if allowed to grow large, they lose interpretability. In this paper, we present a novel mixed integer programming formulation to construct optimal decision trees ...
ALTERNATIVE MIXED INTEGER PROGRAMMING FOR FINDING EFFICIENT BCC UNIT
Data Envelopment Analysis (DEA) cannot adequately discriminate among efficient decision making units (DMUs), so discriminating among these efficient DMUs is an interesting research subject. The purpose of this paper is to develop the mixed integer linear model proposed by Foroughi (Foroughi A.A. A new mixed integer linear model for selecting the best decision making units in data envelo...
A Two Stage Stochastic Programming Model of the Price Decision Problem in the Dual-channel Closed-loop Supply Chain
In this paper, we propose a new model for designing integrated forward/reverse logistics based on pricing policy in direct and indirect sales channels. The proposed model includes producers, a disposal center, distributors, and final customers. We assume that the locations of final customers are fixed. First, a deterministic mixed integer linear programming model is developed for integrated logistic...
A Mixed Integer Programming Approach to Optimal Feeder Routing for Tree-Based Distribution System: A Case Study
A genetic algorithm is proposed to optimize a tree-structured power distribution network considering optimal cable sizing. For minimizing the total cost of the network, a mixed-integer programming model is presented determining the optimal sizes of cables with minimized location-allocation cost. For designing the distribution lines in a power network, the primary factors must be considered as m...
I. INTRODUCTION: Decision tree induction is a popular method for mining knowledge from data by means of decision tree building
In decision analysis, decision trees are commonly used as a visual support tool for identifying the best strategy that is most likely to reach a desired goal. A decision tree is a hierarchical structure normally represented as a tree-like graph model. The tree consists of decision nodes, splitting paths based on the values of a decision node, and sink nodes representing final decisions. In data...
Publication date: 2018